fix: allow PartitionField's field_id to be missing in Iceberg v1 #121
Signed-off-by: Smith Cruise <[email protected]>
src/iceberg/json_internal.cc
  std::vector<PartitionField> fields;
  for (const auto& entry_json : partition_spec_json) {
-   ICEBERG_ASSIGN_OR_RAISE(auto field, PartitionFieldFromJson(entry_json));
+   ICEBERG_ASSIGN_OR_RAISE(auto field, PartitionFieldFromJson(entry_json, true));
Can we use something like the code below?
ICEBERG_ASSIGN_OR_RAISE(auto field, PartitionFieldFromJson(entry_json, /*allow_field_id_missing=*/format_version == 1));
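For illustration, the suggestion above could look roughly like the sketch below. It is self-contained and does not use the real iceberg-cpp types: `FakeJson`, `ParsedField`, `ParseField`, and `ParseFieldForVersion` are hypothetical names, and `std::optional` stands in for whatever result type the real parser returns. The only idea taken from the comment is that the flag is derived from `format_version == 1`.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a parsed JSON object (key -> integer value).
using FakeJson = std::unordered_map<std::string, int32_t>;

// Hypothetical, simplified partition field.
struct ParsedField {
  int32_t source_id;
  std::optional<int32_t> field_id;  // empty when the metadata omits "field-id"
};

// Returns std::nullopt on a parse error. A missing "field-id" is only an
// error when allow_field_id_missing is false.
std::optional<ParsedField> ParseField(const FakeJson& json,
                                      bool allow_field_id_missing) {
  auto source = json.find("source-id");
  if (source == json.end()) return std::nullopt;

  ParsedField field{source->second, std::nullopt};
  auto id = json.find("field-id");
  if (id != json.end()) {
    field.field_id = id->second;
  } else if (!allow_field_id_missing) {
    return std::nullopt;
  }
  return field;
}

// Derives the flag from the table format version, as suggested in the review.
std::optional<ParsedField> ParseFieldForVersion(const FakeJson& json,
                                                int format_version) {
  return ParseField(json, /*allow_field_id_missing=*/format_version == 1);
}
```

With this shape, the caller does not hard-code `true`; a v1 table tolerates the missing id while later versions still reject it.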
I think we should improve this in another way. I don't think we should set it to iceberg-cpp/src/iceberg/json_internal.cc Line 675 in ed49d1e
Instead, I would follow the same pattern as in Java, where we assign them incrementally from 1000. This is also suggested by the spec.
In V1, it is not allowed to drop or re-order fields, so this will always be consistent.
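A minimal sketch of that incremental assignment, under stated assumptions: `SketchPartitionField`, `AssignMissingFieldIds`, and `kFirstPartitionFieldId` are hypothetical names, not the iceberg-cpp API, and the sketch assumes that either every field in a spec carries an id or none does (it does not try to resolve a mix of the two).

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Assumed starting id, taken from the "incrementally from 1000" remark above.
constexpr int32_t kFirstPartitionFieldId = 1000;

// Hypothetical, simplified partition field.
struct SketchPartitionField {
  std::string name;
  std::optional<int32_t> field_id;  // empty when v1 metadata omits the id
};

// Fills in missing ids by position: the i-th field gets 1000 + i.
// Because v1 does not allow dropping or re-ordering partition fields,
// the position of a field is stable, so the assigned id is stable too.
void AssignMissingFieldIds(std::vector<SketchPartitionField>& fields) {
  int32_t next_id = kFirstPartitionFieldId;
  for (auto& field : fields) {
    if (!field.field_id.has_value()) {
      field.field_id = next_id;
    }
    ++next_id;
  }
}
```

Running this on a two-field v1 spec without ids would give the first field id 1000 and the second 1001, and re-reading the same metadata later yields the same ids.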
Your idea is correct. I think I can change signature to if next_partition_field_id has value, we assign field_id from
mapleFU
left a comment
Generally, I think PartitionFieldFromJson(json, /*allow_field_id_missing=*/true) is equivalent to the current logic, but this LGTM. Also cc @wgtmac
Just ensure that
I believe @Fokko's idea has already been implemented, as in https://github.com/apache/iceberg-cpp/blob/main/src/iceberg/json_internal.cc#L1058-L1061. Actually my intention is that
Or just using
@wgtmac The CI run failed; could you help me rerun it?
Nice, that's great to hear! @Smith-Cruise Could you fix the CI? :)
It just looks like a network issue; I think retrying can fix it.
wgtmac
left a comment
LGTM. Thanks @Smith-Cruise!
Sorry, I don't have permission to rerun CI. You may just push again to re-trigger it, or wait for @Fokko to rerun it.
BTW, could you modify the PR title to follow the conventional commit format? For example, fix: allow PartitionField's field_id to be missing in Iceberg v1
@Smith-Cruise Thanks for fixing this! 🙌 And thanks for the review, @mapleFU and @wgtmac.
…che#121) It looks like we forgot to use `allow_field_id_missing` in `PartitionFieldFromJson(const nlohmann::json& json, bool allow_field_id_missing)`. In Iceberg v1, field_id should be allowed to be missing. --------- Signed-off-by: Smith Cruise <[email protected]>

